Update HomeserverTestCase.get_success(...) and friends to drive async Rust (Tokio runtime/thread pool) #19871

Draft
MadLittleMods wants to merge 30 commits into develop from madlittlemods/better-get-success

@MadLittleMods MadLittleMods commented Jun 19, 2026

Update HomeserverTestCase.get_success(...) and friends to drive async Rust (Tokio runtime/thread pool)

Spawned from adding some more async Rust things in #19846 and noticing that we have an existing pattern to use instead of the custom till_deferred_has_result(...) that has crept into a few files.

Alternative to #19867, spurred on by this comment from @erikjohnston:

"Does this slow down the entire test suite?"

| . | Before | After |
| --- | --- | --- |
| trial (3.10, sqlite, all) | 7m - 8m 35s | TODO |
| trial (3.10, postgres, 14, all) | 19m 53s | TODO |

Dev notes

#19394 (comment) and #19734 (comment) discuss why you sometimes need to call self.reactor.advance(0) before you can actually call self.reactor.advance(...), and give some reasoning for why pump(...) may have become a thing.

Todo

  • Remove till_deferred_has_result
  • Remove wait_on_thread

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

@MadLittleMods MadLittleMods changed the title from "Update get_success(...) and friends to drive async Rust (Tokio runtime/thread pool)" to "Update HomeserverTestCase.get_success(...) and friends to drive async Rust (Tokio runtime/thread pool)" Jun 19, 2026
event.room_version,
),
exc=LimitExceededError,
by=0.5,

Contributor Author

In a lot of cases, the by usage didn't seem necessary at all: the test still passes without it, so there is no need to advance time in the reactor/clock.

Comment thread tests/unittest.py
# whole chain to completion.
self.reactor.pump([by] * 100)

def get_success(self, d: Awaitable[TV], by: float = 0.0) -> TV:

Contributor Author

Removed the by arg as it encourages bad behavior (people use it as a hammer to advance time, without reasoning about why, just to make things work) and because we arbitrarily advance time by 100x this amount (imprecise).

I've instead updated the few places that we use this with a precise self.reactor.advance(...) as necessary.
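
To put a number on the "100x" point, here is a small arithmetic sketch. It assumes the old helper advanced the clock via `self.reactor.pump([by] * 100)`, as in the diff context above. The timer names and deadlines are made up purely for illustration.

```python
PUMP_ROUNDS = 100  # from `self.reactor.pump([by] * 100)` in the diff context


def elapsed_after_pump(by: float, rounds: int = PUMP_ROUNDS) -> float:
    """Total fake time that passes when the clock is pumped `rounds` times by `by`."""
    return sum([by] * rounds)


def timers_fired(deadlines: dict[str, float], elapsed: float) -> list[str]:
    """Names of the timers whose deadline has been reached after `elapsed` seconds."""
    return [name for name, at in deadlines.items() if at <= elapsed]


# Hypothetical timers: one the test actually cares about, one it does not.
deadlines = {"replication_chunk": 1e-6, "ratelimit_window": 30.0}

# `by=0.5` silently moves the clock 50 seconds, so the unrelated timer fires too.
fired_with_pump = timers_fired(deadlines, elapsed_after_pump(0.5))

# A precise advance only fires what the test reasons about.
fired_with_precise_advance = timers_fired(deadlines, 1e-6)
```

With `by=0.5` the clock ends up 50 seconds ahead, which is how unrelated timeouts can get triggered as a side effect.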

Comment on lines +953 to +963
sync_d = ensureDeferred(
    worker_presence_handler.user_syncing(
        self.user_id, self.device_id, True, PresenceState.ONLINE
    )
)
# `user_syncing` proxies the presence write to the main process over an HTTP
# replication request. The request body is streamed by a `Cooperator` that uses
# the clock to schedule each chunk at a tiny *non-zero* delay (`_EPSILON`), so
# we need to actually advance the clock for it to fire.
self.reactor.advance(Duration(microseconds=1).as_secs())
self.get_success(sync_d)

Contributor Author

This is the main pattern I'm recommending if you need to advance time by a non-zero increment. ensureDeferred works well, but the name doesn't make it obvious that we want the task to run in the background on its own.

run_in_background(...) would also work but its usage is a bit awkward. I guess we could use run_coroutine_in_background(...) instead 🤔

The difference between ensureDeferred(...) vs run_in_background(...)/run_coroutine_in_background(...) is all of the extra LoggingContext (log context) handling. It doesn't matter for tests though.
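
The pattern above (start the work, advance the clock by a precise amount, then collect the result) can be sketched without Twisted. Everything here is a toy: `FakeClock`, `BackgroundRequest` and `start_request` are hypothetical names, and the `1e-8` delay is only an assumed stand-in for the `_EPSILON` mentioned in the code comment.

```python
from typing import Callable


class FakeClock:
    """Toy clock: time only moves when the test calls `advance()`."""

    def __init__(self) -> None:
        self.now = 0.0
        self._calls: list[tuple[float, Callable[[], None]]] = []

    def call_later(self, delay: float, fn: Callable[[], None]) -> None:
        self._calls.append((self.now + delay, fn))

    def advance(self, amount: float) -> None:
        self.now += amount
        due = [call for call in self._calls if call[0] <= self.now]
        self._calls = [call for call in self._calls if call[0] > self.now]
        for _, fn in due:
            fn()


class BackgroundRequest:
    """Hypothetical stand-in for the Deferred returned by `ensureDeferred(...)`."""

    def __init__(self) -> None:
        self.done = False


def start_request(clock: FakeClock, epsilon: float = 1e-8) -> BackgroundRequest:
    """Start work that sends its single chunk after a tiny non-zero delay."""
    request = BackgroundRequest()

    def send_chunk() -> None:
        request.done = True

    clock.call_later(epsilon, send_chunk)
    return request


clock = FakeClock()
request = start_request(clock)  # 1. kick the work off in the background
clock.advance(0)                # a zero advance cannot fire a non-zero delay
still_pending = not request.done
clock.advance(1e-6)             # 2. advance by a precise, reasoned amount
finished = request.done         # 3. only now is the result available
```

The toy only models a single chunk; work that reschedules itself would need one advance per step.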

Comment on lines +81 to +86
# XXX: There can be a few already dispatched database queries (from normal
# background tasks in Synapse) and the threadless `ThreadPool` that we use in
# tests uses *untracked* clock calls to pass database results back so `shutdown`
# doesn't cancel those calls. This is a quirk of our test infrastructure
# (threadless `ThreadPool`) so this kind of "hack" is fine.
self.reactor.advance(0)

Contributor Author

The explanation is slightly hand-wavy.

…ocess_join_after_server_leaves_room`

`wait_for_background_updates` is not relevant
# Process the leave and join in one go.
dir_handler.update_user_directory = True
dir_handler.notify_new_event()
self.wait_for_background_updates()

Contributor Author

As far as I can tell, self.wait_for_background_updates() is totally bogus here. I assume the mistake came about because notify_new_event(...) uses run_as_background_process(...), but that's a totally separate thing (background updates != background process).

It made the test work only because wait_for_background_updates(...) did a get_success(..., by=0.1), which pumped and advanced the reactor/clock.

But we can replace it with something more precise.

Comment thread tests/unittest.py
# reactor to run (like `reactor.callFromThread(...)`)
self.reactor.advance(0)

def get_success(

Contributor Author

The primary change of this PR is changing get_success(...)/get_failure(...) to be able to make progress on any awaitable that needs to do async Rust work.

The rest is just adjusting things because we removed the by arg (see other discussion) and stopped calling pump(...).
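
As a rough illustration of what "making progress" means here, the sketch below uses a plain Python thread in place of a Tokio worker. This is not the PR's implementation. `ToyReactor`, `Pending`, `start_thread_work` and this `get_success` are hypothetical, and only the idea of calling `advance(0)` so that thread hand-offs get to run is taken from the diff context.

```python
import queue
import threading
import time


class ToyReactor:
    """Hypothetical test reactor: callbacks posted from other threads only run
    when the test thread calls `advance(...)`."""

    def __init__(self) -> None:
        self._from_threads: queue.Queue = queue.Queue()

    def call_from_thread(self, fn, *args) -> None:
        self._from_threads.put((fn, args))

    def advance(self, amount: float) -> None:
        # Only the thread hand-off is modelled; `amount` is ignored in this toy.
        while True:
            try:
                fn, args = self._from_threads.get_nowait()
            except queue.Empty:
                return
            fn(*args)


class Pending:
    """Minimal result holder standing in for a Deferred."""

    def __init__(self) -> None:
        self.called = False
        self.result = None

    def resolve(self, value) -> None:
        self.called = True
        self.result = value


def start_thread_work(reactor: ToyReactor, value: int) -> Pending:
    """Do the work on another thread, then hand the result back to the reactor."""
    pending = Pending()

    def work() -> None:
        reactor.call_from_thread(pending.resolve, value * 2)

    threading.Thread(target=work, daemon=True).start()
    return pending


def get_success(reactor: ToyReactor, pending: Pending, timeout: float = 5.0):
    """Keep giving the reactor a chance to run hand-offs until a result arrives."""
    deadline = time.monotonic() + timeout
    while not pending.called:
        if time.monotonic() > deadline:
            raise AssertionError("work did not complete in time")
        reactor.advance(0)
        time.sleep(0.001)  # let the worker thread make progress
    return pending.result
```

The point of the loop is that no fake time needs to pass: the test thread only has to keep servicing hand-offs while the real thread finishes.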

Comment thread tests/unittest.py
Comment on lines +867 to +873
# FIXME: Remove as this has the exact same semantics as `get_success()`. In
# https://github.com/matrix-org/synapse/pull/8402#discussion_r495992506 where it was
# introduced, it was claimed that "get_success fails the test if the deferred fails
# rather than raising, which I find a bit unintuitive." but `get_success()` actually
# does raise "@raise SynchronousTestCase.failureException : If the
# L{Deferred<twisted.internet.defer.Deferred>} has no result or has a failure
# result." at-least in today's world.

Contributor Author

I think this is accurate (follow-up PR)

Comment on lines -62 to -63
duration_ms = 10
await self.clock.sleep(Duration(milliseconds=count * duration_ms))

Contributor Author

Instead of making the sleep duration dependent on the count (dynamic), I've just made it static so we can be precise with our time advancements below.

for callbable, args, kwargs in triggers:
callbable(*args, **kwargs)

def till_deferred_has_result(

Contributor Author

We can remove till_deferred_has_result because get_success(...) covers it on its own now

…tionTestCase.test_first_get_event_cancelled`

Based on the same fix made in f22e7cd
Comment thread tests/unittest.py
Comment on lines +767 to +777
# Checking `d.called` by itself is not sufficient by itself as this is possible:
#
# If you have a first `Deferred` `D1`, you can add a callback which returns
# another `Deferred` `D2`, and `D2` must then complete before any further
# callbacks on `D1` will execute (and later callbacks on `D1` get the *result*
# of `D2` rather than `D2` itself).
#
# So, `D1` might have `called=True` (as in, it has started running its
# callbacks), but any new callbacks added to `D1` won't get run until `D2`
# completes. Fortunately, we can detect this by checking `d.paused`.
while not d.called or d.paused:

Contributor Author

This language is the same explanation given in f22e7cd

You can reproduce the problem with this test: SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.databases.main.test_events_worker.GetEventCancellationTestCase.test_first_get_event_cancelled
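
The `called`/`paused` distinction described in the code comment can be seen with a toy model. `MiniDeferred` below is a deliberately simplified, hypothetical imitation of the chaining behaviour and is not Twisted's implementation; `is_settled` mirrors the `while not d.called or d.paused` condition.

```python
class MiniDeferred:
    """Toy model of Deferred chaining (hypothetical, not Twisted's code)."""

    def __init__(self) -> None:
        self.called = False
        self.paused = 0
        self.result = None
        self.callbacks = []

    def add_callback(self, fn) -> "MiniDeferred":
        self.callbacks.append(fn)
        if self.called:
            self._run()
        return self

    def callback(self, result) -> None:
        self.called = True
        self.result = result
        self._run()

    def _run(self) -> None:
        while self.callbacks and not self.paused:
            fn = self.callbacks.pop(0)
            self.result = fn(self.result)
            if isinstance(self.result, MiniDeferred):
                # A callback returned another deferred: pause until it completes.
                inner = self.result
                self.paused += 1
                inner.add_callback(self._resume)

    def _resume(self, result):
        # Later callbacks see the inner deferred's *result*, not the deferred.
        self.result = result
        self.paused -= 1
        self._run()
        return result


def is_settled(d: MiniDeferred) -> bool:
    """Inverse of the loop condition `not d.called or d.paused`."""
    return d.called and not d.paused


d2 = MiniDeferred()
d1 = MiniDeferred()
d1.add_callback(lambda _: d2)
d1.callback(None)

seen = []
d1.add_callback(seen.append)     # added after d1 "fired", but d1 is paused on d2
settled_before = is_settled(d1)  # called=True, paused=1

d2.callback("done")
settled_after = is_settled(d1)   # paused back to 0, late callback has run
```

Checking `called` alone would report `d1` as finished while its late callback has not yet run.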
